• Linear Regression: Standard linear regression models a relationship between a dependent variable (y) and an independent variable (x) as a straight line:

y = β₀ + β₁x

Where:

β₀ is the intercept.

β₁ is the slope.

  • Introducing the Quadratic Term: Quadratic regression extends linear regression by adding a squared term of the independent variable (x²):

y = β₀ + β₁x + β₂x²

Where:

β₂ is the coefficient of the squared term.

The Curve:

The x² term introduces a curve into the relationship.

If β₂ is positive, the curve opens upward (like a U).

If β₂ is negative, the curve opens downward (like an inverted U).
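As a minimal sketch of how the sign of β₂ shows up in a fitted model, the following fits a quadratic with `lm()` to two simulated data sets. The data, the seed, and the names `x`, `y_up`, and `y_down` are made up for illustration and are not part of the analysis below.

```r
# Simulated data (hypothetical): one U-shaped and one inverted-U relationship
set.seed(1)
x <- seq(0, 6, length.out = 50)
y_up   <- 2 - 3 * x + 0.5 * x^2 + rnorm(50, sd = 0.2)  # true beta2 = +0.5
y_down <- 2 + 3 * x - 0.5 * x^2 + rnorm(50, sd = 0.2)  # true beta2 = -0.5

# I(x^2) adds the squared term inside the model formula
fit_up   <- lm(y_up ~ x + I(x^2))
fit_down <- lm(y_down ~ x + I(x^2))

b2_up   <- coef(fit_up)[["I(x^2)"]]
b2_down <- coef(fit_down)[["I(x^2)"]]

# Positive beta2: curve opens upward; negative beta2: curve opens downward
c(U_shape = b2_up, inverted_U = b2_down)
```

The estimated β₂ should come out positive for the first fit and negative for the second, matching the two shapes described above.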

1 Sheet 1

1.1 What is the relationship between population and IGF revenue performance patterns?

# Descriptive statistics
Cleaned_KMA_Data %>% skim(Population)
Data summary
Name Piped data
Number of rows 11
Number of columns 76
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Population 0 1 2917182 470947.5 2233000 2549000 2907000 3277000 3630000 ▇▅▅▅▅
Cleaned_KMA_Data %>% skim(IGF)
Data summary
Name Piped data
Number of rows 11
Number of columns 76
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
IGF 0 1 21745673 5157798 12025624 20538616 22708381 24445521 29377277 ▃▁▆▇▃
# Histograms
ggplot(Cleaned_KMA_Data, aes(x = Population)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of Population", x = "Population", y = "Frequency") +
  scale_x_continuous(labels = comma)

ggplot(Cleaned_KMA_Data, aes(x = IGF)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of IGF Revenue", x = "IGF Revenue", y = "Frequency") +
  scale_x_continuous(labels = comma)

# Growth Rate (Percentage)
Cleaned_KMA_Data <- Cleaned_KMA_Data %>%
  mutate(
    Population_Growth_Rate = c(NA, diff(Population) / Population[-length(Population)] * 100),
    IGF_Growth_Rate = c(NA, diff(IGF) / IGF[-length(IGF)] * 100)
  )

# Plot of Trends

ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population)) +
  geom_point(aes(y = Population), color = "dodgerblue") +
  labs(title = "Population Trend", x = "Year", y = "Population") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = IGF)) +
  geom_point(aes(y = IGF), color = "dodgerblue") +
  labs(title = "IGF Trend", x = "Year", y = "IGF") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population, color = "Population")) +
  geom_point(aes(y = Population, color = "Population")) +
  geom_line(aes(y = IGF, color = "IGF")) +
    geom_point(aes(y = IGF, color = "IGF")) +
  labs(title = "Population vs. IGF Revenue", x = "Year", y = "Amount/Population", color = "Type") +
  scale_y_continuous(labels = comma)

# Growth rate plots
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population_Growth_Rate, color = "Population Growth")) +
    geom_point(aes(y = Population_Growth_Rate, color = "Population Growth")) +
  geom_line(aes(y = IGF_Growth_Rate, color = "IGF Growth")) +
    geom_point(aes(y = IGF_Growth_Rate, color = "IGF Growth")) +
  labs(title = "Population Growth vs. IGF Growth", x = "Year", y = "Growth Rate (%)", color = "Type") +
  scale_y_continuous(labels = percent_format(scale = 1)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") # Add horizontal line at zero

The histograms show uneven distributions for both population and IGF revenue. Population had its highest bar around 3,500,000. The trend plots show clearly that the trend of IGF revenue (which experienced significant changes) is not directly linked to the trend of population (which rose steadily).
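The growth-rate columns computed earlier use the period-over-period percentage change: the difference between consecutive values divided by the earlier value, times 100. A small check on a toy series (the values are made up, and the calculation assumes the rows are ordered by Year) might look like this:

```r
# Toy series (hypothetical values) growing 10% per period
x <- c(100, 110, 121)

# Change between consecutive values, divided by the earlier value, times 100.
# The first element has no predecessor, so it is NA.
growth <- c(NA, diff(x) / x[-length(x)] * 100)
growth
```

The first entry is `NA` and the remaining two are 10, which is the pattern to expect in `Population_Growth_Rate` and `IGF_Growth_Rate`: one missing value for the first year, then one rate per later year.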

1.1.1 Regression Analysis

mod1 <- lm(IGF ~ Population, data = Cleaned_KMA_Data)
summary(mod1)
## 
## Call:
## lm(formula = IGF ~ Population, data = Cleaned_KMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -6098470 -2859531   180262  2732570  8474201 
## 
## Coefficients:
##                Estimate  Std. Error t value Pr(>|t|)
## (Intercept) 6304153.559 9432943.752   0.668    0.521
## Population        5.293       3.196   1.656    0.132
## 
## Residual standard error: 4760000 on 9 degrees of freedom
## Multiple R-squared:  0.2336, Adjusted R-squared:  0.1484 
## F-statistic: 2.743 on 1 and 9 DF,  p-value: 0.1321
Cleaned_KMA_Data %>%
  ggplot(aes(x = Population, y = IGF)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) + 
  labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and IGF Revenue") + 
  scale_y_continuous(labels = scales::comma)

# The Quadratic Term
Cleaned_KMA_Data$Population_Squared <- Cleaned_KMA_Data$Population^2

#  Quadratic Regression
mod_quad <- lm(IGF ~ Population + Population_Squared, data = Cleaned_KMA_Data)

summary(mod_quad)
## 
## Call:
## lm(formula = IGF ~ Population + Population_Squared, data = Cleaned_KMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3567751 -1933315  -290773  1436917  5015707 
## 
## Coefficients:
##                               Estimate          Std. Error t value Pr(>|t|)   
## (Intercept)        -160892119.79269299   45252401.28199609  -3.555  0.00745 **
## Population                122.28092169         31.44603817   3.889  0.00462 **
## Population_Squared         -0.00001998          0.00000536  -3.728  0.00580 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3051000 on 8 degrees of freedom
## Multiple R-squared:   0.72,  Adjusted R-squared:   0.65 
## F-statistic: 10.29 on 2 and 8 DF,  p-value: 0.006144
ggplot(Cleaned_KMA_Data, aes(x = Population, y = IGF)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) + # Use formula for quadratic
  labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Quadratic Relationship between Population and IGF Revenue") +
  scale_y_continuous(labels = comma)

Linear Regression:

Coefficients: Intercept: 6,304,153.559; Population: 5.293

P-values: Intercept: 0.521 (not significant); Population: 0.132 (not significant)

R-squared: Multiple R-squared: 0.2336; Adjusted R-squared: 0.1484

Interpretation: The linear model shows a weak and statistically insignificant relationship between population and IGF revenue. Population explains only about 23.36% of the variance in IGF.

Quadratic Regression:

Coefficients: Intercept: -160,892,119.79; Population: 122.28; Population_Squared: -0.00001998

P-values: All coefficients are highly statistically significant (p < 0.01).

R-squared: Multiple R-squared: 0.72; Adjusted R-squared: 0.65

Interpretation: The quadratic model shows a strong and statistically significant relationship between population and IGF revenue. The significant Population_Squared term confirms a non-linear (quadratic) relationship.

The R-squared of 0.72 indicates that the quadratic model explains 72% of the variance in IGF, a substantial improvement over the 23.36% explained by the linear model.
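Because the Population_Squared coefficient is negative, the fitted curve has a maximum at x = -β₁ / (2β₂). The sketch below uses the coefficients as printed in the quadratic summary output, so the result is approximate because of rounding; the variable names are illustrative.

```r
# Coefficients copied from the quadratic model summary (rounded as printed)
b1 <- 122.28092169   # Population
b2 <- -0.00001998    # Population_Squared

# Turning point of y = b0 + b1 * x + b2 * x^2
turning_point <- -b1 / (2 * b2)
turning_point
```

This gives a population of roughly 3.06 million, which lies inside the observed range (2,233,000 to 3,630,000). With only 11 observations, the location of the peak should be read as indicative rather than precise.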

  • Transformations
# Transformed Model
lm(Ln_IGF ~ Ln_Pop, data = Cleaned_KMA_Data) %>% summary()
## 
## Call:
## lm(formula = Ln_IGF ~ Ln_Pop, data = Cleaned_KMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.30907 -0.15091  0.01315  0.16298  0.37510 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   2.1417     6.6600   0.322   0.7551  
## Ln_Pop        0.9898     0.4477   2.211   0.0544 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2315 on 9 degrees of freedom
## Multiple R-squared:  0.3519, Adjusted R-squared:  0.2799 
## F-statistic: 4.887 on 1 and 9 DF,  p-value: 0.05438
# Scatter Plots (Transformed Data)
ggplot(Cleaned_KMA_Data, aes(x = Ln_Pop, y = Ln_IGF)) +
  geom_point() +
  labs(title = "Log(Population) vs. Log(IGF Revenue)", x = "Log(Population)", y = "Log(IGF Revenue)")

After the log transformation, the log model showed a stronger relationship than the simple linear model, and the relationship is marginally significant (p = 0.0544). The square root model is better than the simple linear model but not as good as the log model. The quadratic model provided the best fit among the models, and the significant Population_Squared term supports a non-linear relationship.
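In a log-log model the slope can be read as an elasticity. Assuming `Ln_IGF` and `Ln_Pop` are the natural logs of IGF and Population (the column names suggest this, but the transformation step is not shown), the reported slope translates into a percentage effect as sketched here:

```r
# Slope copied from the log-log model output
b_log <- 0.9898

# Predicted percentage change in IGF for a 1% increase in Population
pct_change <- (1.01^b_log - 1) * 100
pct_change
```

The result is just under 1, so a 1% rise in population is associated with a rise of a little under 1% in IGF revenue, subject to the wide standard error on the slope (0.4477).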

  • Checking Regression Assumptions
# Scatter Plot

ggplot(Cleaned_KMA_Data, aes(x = Population, y = IGF)) +
  geom_point() +
  labs(title = "Population vs. IGF Revenue", x = "Population", y = "IGF Revenue")

# Residual
ggplot(data = data.frame(residuals = residuals(mod1), fitted = fitted(mod1)), aes(x = fitted, y = residuals)) +
  geom_point() + # Added geom_point()
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted (Linear) ", x = "Fitted Values", y = "Residuals")

ggplot(data = data.frame(residuals = residuals(mod1)), aes(x = residuals)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Residuals(Linear)", x = "Residuals")

ggplot(data = data.frame(residuals = residuals(mod1)), aes(sample = residuals)) +
  geom_point(stat = "qq") +
  stat_qq_line() +
  labs(title = "Q-Q Plot of Residuals")

#  Residuals vs. Fitted Values
ggplot(data = data.frame(residuals = residuals(mod_quad), fitted = fitted(mod_quad)), 
       aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted (Quadratic Model)", x = "Fitted Values", y = "Residuals")

#  Histogram of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(x = residuals)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Residuals (Quadratic Model)", x = "Residuals")

#  Q-Q Plot of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(sample = residuals)) +
  geom_point(stat = "qq") +
  stat_qq_line() +
  labs(title = "Q-Q Plot of Residuals (Quadratic Model)")

#  Durbin-Watson Test (Autocorrelation)
dwtest(mod_quad)
## 
##  Durbin-Watson test
## 
## data:  mod_quad
## DW = 1.3293, p-value = 0.01328
## alternative hypothesis: true autocorrelation is greater than 0
#  Breusch-Pagan Test (Homoscedasticity)
bptest(mod_quad)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod_quad
## BP = 0.88786, df = 2, p-value = 0.6415
#  Variance Inflation Factor (VIF) - Multicollinearity
vif(mod_quad)
##         Population Population_Squared 
##           235.5719           235.5719

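Some of the assumptions are violated. The Durbin-Watson test (DW = 1.3293, p = 0.01328) indicates positive autocorrelation in the residuals of the quadratic model, and the VIF of 235.57 for both Population and Population_Squared indicates severe multicollinearity. The Breusch-Pagan test (p = 0.6415) gives no evidence of heteroscedasticity.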
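The very large VIF arises because a variable and its square are almost perfectly correlated when all values lie far from zero. With two predictors, VIF = 1 / (1 - r²), where r is their correlation. The sketch below uses a hypothetical, evenly spaced predictor on the same scale as Population (not the real data) to show the effect of centering before squaring; `vif_two` is a helper defined only for this illustration.

```r
# Hypothetical predictor on the same scale as Population (not the real data)
x <- seq(2.2e6, 3.6e6, length.out = 11)

# With exactly two predictors, VIF = 1 / (1 - r^2)
vif_two <- function(a, b) 1 / (1 - cor(a, b)^2)

vif_raw <- vif_two(x, x^2)        # x and x^2 are almost collinear

xc <- x - mean(x)                 # centre before squaring
vif_centered <- vif_two(xc, xc^2)

c(raw = vif_raw, centered = vif_centered)
```

The raw VIF should be in the hundreds and the centered VIF close to 1, the same pattern as the 235.57 versus 1.00 values in this report.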

  • Diagnostics
# Centering Population  Model
Cleaned_KMA_Data$Population_Centered <- Cleaned_KMA_Data$Population - mean(Cleaned_KMA_Data$Population)
Cleaned_KMA_Data$Population_Centered_Squared <- Cleaned_KMA_Data$Population_Centered^2

mod_quad_centered <- lm(IGF ~ Population_Centered + Population_Centered_Squared, data = Cleaned_KMA_Data)
summary(mod_quad_centered)
## 
## Call:
## lm(formula = IGF ~ Population_Centered + Population_Centered_Squared, 
##     data = Cleaned_KMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3567751 -1933315  -290773  1436917  5015707 
## 
## Coefficients:
##                                      Estimate        Std. Error t value
## (Intercept)                 25774690.24572124  1419247.31189146  18.161
## Population_Centered                5.69657280        2.05167508   2.777
## Population_Centered_Squared       -0.00001998        0.00000536  -3.728
##                                 Pr(>|t|)    
## (Intercept)                 0.0000000868 ***
## Population_Centered               0.0241 *  
## Population_Centered_Squared       0.0058 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3051000 on 8 degrees of freedom
## Multiple R-squared:   0.72,  Adjusted R-squared:   0.65 
## F-statistic: 10.29 on 2 and 8 DF,  p-value: 0.006144
# Diagnostic Tests on Centered Model
# Residuals vs. Fitted Values (Centered)
ggplot(data = data.frame(residuals = residuals(mod_quad_centered), fitted = fitted(mod_quad_centered)), 
       aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted (Centered Quadratic Model)", x = "Fitted Values", y = "Residuals")

# Histogram of Residuals (Centered)
ggplot(data = data.frame(residuals = residuals(mod_quad_centered)), aes(x = residuals)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Residuals (Centered Quadratic Model)", x = "Residuals")

# Q-Q Plot of Residuals (Centered)
ggplot(data = data.frame(residuals = residuals(mod_quad_centered)), aes(sample = residuals)) +
  geom_point(stat = "qq") +
  stat_qq_line() +
  labs(title = "Q-Q Plot of Residuals (Centered Quadratic Model)")

# Durbin-Watson Test (Centered)
dwtest(mod_quad_centered)
## 
##  Durbin-Watson test
## 
## data:  mod_quad_centered
## DW = 1.3293, p-value = 0.01328
## alternative hypothesis: true autocorrelation is greater than 0
# Breusch-Pagan Test (Centered)
bptest(mod_quad_centered)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod_quad_centered
## BP = 0.88786, df = 2, p-value = 0.6415
# VIF (Centered)
vif(mod_quad_centered)
##         Population_Centered Population_Centered_Squared 
##                    1.002787                    1.002787

Therefore, from the analysis so far, we found a strong, curved relationship between population and IGF revenue. The quadratic model (IGF ~ Population + Population_Squared) is the most appropriate for describing the relationship between Population and IGF (p-value = 0.006144, Multiple R-squared = 0.72). Centering Population removes the multicollinearity (VIF falls from 235.57 to 1.00), and the Breusch-Pagan test shows no heteroscedasticity. Only autocorrelation remains (DW = 1.3293, p = 0.01328), which suggests the model may not fully capture time-related patterns. A larger sample may help resolve that.
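The Durbin-Watson statistic itself is simple to compute: it is the sum of squared successive differences of the residuals divided by their sum of squares. Values near 2 suggest no first-order autocorrelation, and values well below 2 suggest positive autocorrelation. The sketch below uses simulated series (hypothetical, with an assumed AR(1) coefficient of 0.8) rather than the model residuals; `dw_stat` is a helper written for this illustration.

```r
# Durbin-Watson statistic: squared successive differences over sum of squares
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

set.seed(42)
n <- 200
white <- rnorm(n)   # independent errors
# AR(1) errors with coefficient 0.8 (hypothetical), built recursively
ar1 <- as.numeric(stats::filter(rnorm(n), filter = 0.8, method = "recursive"))

dw_white <- dw_stat(white)
dw_ar1   <- dw_stat(ar1)
c(independent = dw_white, autocorrelated = dw_ar1)
```

Applied to `residuals(mod_quad)`, the same function should reproduce the DW value that `dwtest()` reports; the p-value still requires `dwtest()`.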

1.2 What is the relationship between population and DACF revenue performance patterns?

Cleaned_KMA_Data %>% skim(Population)
Data summary
Name Piped data
Number of rows 11
Number of columns 81
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Population 0 1 2917182 470947.5 2233000 2549000 2907000 3277000 3630000 ▇▅▅▅▅
Cleaned_KMA_Data %>% skim(DACF)
Data summary
Name Piped data
Number of rows 11
Number of columns 81
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
DACF 0 1 5446273 2142998 2523770 3107713 6274711 7244209 7396115 ▅▂▁▃▇
# Histograms
ggplot(Cleaned_KMA_Data, aes(x = Population)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of Population", x = "Population")

ggplot(Cleaned_KMA_Data, aes(x = DACF)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of DACF Revenue", x = "DACF Revenue")

# Growth Rates (Percentage)
Cleaned_KMA_Data <- Cleaned_KMA_Data %>%
  mutate(
    Population_Growth_Rate = c(NA, diff(Population) / Population[-length(Population)] * 100),
    DACF_Growth_Rate = c(NA, diff(DACF) / DACF[-length(DACF)] * 100)
  )




# Plotting Trends

ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population)) +
  geom_point(aes(y = Population), color = "dodgerblue") +
  labs(title = "Population Trend", x = "Year", y = "Population") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = DACF)) +
  geom_point(aes(y = DACF), color = "dodgerblue") +
  labs(title = "DACF Trend", x = "Year", y = "DACF") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population, color = "Population")) +
  geom_point(aes(y = Population, color = "Population")) +
  geom_line(aes(y = DACF, color = "DACF")) +
  geom_point(aes(y = DACF, color = "DACF")) +
  labs(title = "Population vs. DACF Revenue", x = "Year", y = "Amount/Population", color = "Type") +
  scale_y_continuous(labels = scales::comma)

# Plotting Growth Rates
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population_Growth_Rate, color = "Population Growth")) +
  geom_point(aes(y = Population_Growth_Rate, color = "Population Growth")) +
  geom_line(aes(y = DACF_Growth_Rate, color = "DACF Growth")) +
  geom_point(aes(y = DACF_Growth_Rate, color = "DACF Growth")) +
  labs(title = "Population Growth vs. DACF Growth", x = "Year", y = "Growth Rate (%)", color = "Type")+
  geom_hline(yintercept = 0, linetype = "dashed", color = "red")

The histograms show uneven distributions for both population and DACF revenue. The trend plots show clearly that the trend of DACF revenue (which experienced significant changes) is not directly linked to the trend of population (which rose steadily).

1.2.1 Regression Analysis

mod2 <- lm(DACF ~ Population, data = Cleaned_KMA_Data)
summary(mod2)
## 
## Call:
## lm(formula = DACF ~ Population, data = Cleaned_KMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3266584 -1367838    20734  1109642  2600396 
## 
## Coefficients:
##                 Estimate   Std. Error t value Pr(>|t|)
## (Intercept) -1461779.744  3822900.230  -0.382    0.711
## Population         2.368        1.295   1.828    0.101
## 
## Residual standard error: 1929000 on 9 degrees of freedom
## Multiple R-squared:  0.2708, Adjusted R-squared:  0.1898 
## F-statistic: 3.343 on 1 and 9 DF,  p-value: 0.1008
Cleaned_KMA_Data %>%
  ggplot(aes(x = Population, y = DACF)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) + # Added confidence intervals
  labs(x = "Population", y = "DACF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and DACF Revenue") +
  scale_y_continuous(labels = scales::comma)

#  Quadratic Regression
mod_quad2 <- lm(DACF ~ Population + Population_Squared, data = Cleaned_KMA_Data)

summary(mod_quad2)
## 
## Call:
## lm(formula = DACF ~ Population + Population_Squared, data = Cleaned_KMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2563570  -863129   234425   991928  1959111 
## 
## Coefficients:
##                               Estimate          Std. Error t value Pr(>|t|)  
## (Intercept)        -52642251.321280584  24234543.657086555  -2.172   0.0616 .
## Population                38.179151978        16.840661785   2.267   0.0531 .
## Population_Squared        -0.000006117         0.000002870  -2.131   0.0657 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1634000 on 8 degrees of freedom
## Multiple R-squared:  0.5349, Adjusted R-squared:  0.4186 
## F-statistic:   4.6 on 2 and 8 DF,  p-value: 0.04681
ggplot(Cleaned_KMA_Data, aes(x = Population, y = DACF)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) + # Use formula for quadratic
  labs(x = "Population", y = "DACF Revenue (Ghana Cedis)", title = "Quadratic Relationship between Population and DACF Revenue") +
  scale_y_continuous(labels = comma)

The linear model shows a weak and statistically insignificant relationship between population and DACF revenue (p = 0.1008). Population explains only about 27.08% of the variance in DACF.

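The quadratic model as a whole shows a statistically significant relationship between population and DACF revenue (F-test p = 0.04681). However, the individual Population (p = 0.0531) and Population_Squared (p = 0.0657) terms are only marginally significant and do not reach the 5% level.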
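The overall F-test assesses both terms jointly, which is why it can be significant while each individual t-test is only marginal. As a rough check, the F-statistic and its p-value can be recovered from the printed R-squared alone; the values below are copied from the output, so small rounding differences are expected.

```r
# Values copied from the quadratic DACF model summary
r2 <- 0.5349   # Multiple R-squared
n  <- 11       # observations
k  <- 2        # predictors (Population, Population_Squared)

# Overall F-statistic and its p-value
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))
p_val  <- pf(f_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

c(F = f_stat, p = p_val)
```

This should agree with the reported F-statistic of 4.6 on 2 and 8 degrees of freedom and the p-value of 0.04681 up to rounding.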

  • Checking Regression Assumptions
 #Scatter Plot 
ggplot(Cleaned_KMA_Data, aes(x = Population, y = DACF)) +
  geom_point() +
  labs(title = "Population vs. DACF Revenue",
       x = "Population", y = "DACF Revenue")

#  Residual 
ggplot(data = data.frame(residuals = residuals(mod2),
                        fitted = fitted(mod2)),
       aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted",
       x = "Fitted Values", y = "Residuals")

ggplot(data = data.frame(residuals = residuals(mod2)),
       aes(x = residuals)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Residuals", x = "Residuals")

ggplot(data = data.frame(residuals = residuals(mod2)),
       aes(sample = residuals)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "Q-Q Plot of Residuals ")

# Autocorrelation
dwtest(mod2)
## 
##  Durbin-Watson test
## 
## data:  mod2
## DW = 1.6094, p-value = 0.1371
## alternative hypothesis: true autocorrelation is greater than 0
# Homoscedasticity (Constant Variance of Residuals)

bptest(mod2)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod2
## BP = 0.000024319, df = 1, p-value = 0.9961
# Multicollinearity
# This is a simple linear regression with one predictor (Population), so multicollinearity is not an issue.

# Multivariate Normality
# With only one predictor (Population), multivariate normality is likewise not an issue.




#  Residuals vs. Fitted Values
ggplot(data = data.frame(residuals = residuals(mod_quad2), fitted = fitted(mod_quad2)),
       aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted (Quadratic Model)", x = "Fitted Values", y = "Residuals")

#  Histogram of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad2)), aes(x = residuals)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Residuals (Quadratic Model)", x = "Residuals")

#  Q-Q Plot of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad2)), aes(sample = residuals)) +
  geom_point(stat = "qq") +
  stat_qq_line() +
  labs(title = "Q-Q Plot of Residuals (Quadratic Model)")

#  Durbin-Watson Test (Autocorrelation)
dwtest(mod_quad2)
## 
##  Durbin-Watson test
## 
## data:  mod_quad2
## DW = 2.585, p-value = 0.5951
## alternative hypothesis: true autocorrelation is greater than 0
#  Breusch-Pagan Test (Homoscedasticity)
bptest(mod_quad2)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod_quad2
## BP = 1.8585, df = 2, p-value = 0.3949
#  Variance Inflation Factor (VIF) - Multicollinearity
vif(mod_quad2)
##         Population Population_Squared 
##           235.5719           235.5719

The scatter plot shows a positive but non-linear relationship: as population increases, DACF revenue tends to increase as well. The histogram of residuals shows a potential violation of the normality assumption. The Durbin-Watson test reveals no autocorrelation, and the Breusch-Pagan test indicates homoscedasticity.
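For readers unfamiliar with the studentized Breusch-Pagan test, the statistic is n times the R-squared from regressing the squared residuals on the predictors, compared with a chi-squared distribution. The sketch below computes it by hand on simulated data (hypothetical, built so that the error spread grows with `x`); it is an illustration of the idea, not a replacement for `bptest()`.

```r
# Simulated data (hypothetical) whose error spread grows with x
set.seed(7)
n <- 200
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = x)

fit <- lm(y ~ x)

# Auxiliary regression of squared residuals on the predictor
e2  <- residuals(fit)^2
aux <- lm(e2 ~ x)

# Studentized (Koenker) BP statistic: n * R^2 of the auxiliary regression,
# compared with chi-squared (df = number of predictors)
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

c(BP = bp_stat, p = bp_p)
```

A small p-value here signals heteroscedasticity, in contrast to the large p-values reported for the models in this section.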

  • Transforming the linear regression
#Transformed Models
lm(log(DACF) ~ log(Population), data = Cleaned_KMA_Data) %>% 
  summary()
# 
# Call:
# lm(formula = log(DACF) ~ log(Population), data = Cleaned_KMA_Data)
# 
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -0.67169 -0.25524 -0.00088  0.23629  0.55431 
# 
# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)  
# (Intercept)     -10.2034    11.4342  -0.892   0.3954  
# log(Population)   1.7227     0.7687   2.241   0.0517 .
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Residual standard error: 0.3974 on 9 degrees of freedom
# Multiple R-squared:  0.3582,  Adjusted R-squared:  0.2869 
# F-statistic: 5.023 on 1 and 9 DF,  p-value: 0.05175
lm( sqrt(DACF)~sqrt(Population), data = Cleaned_KMA_Data ) %>% 
  summary()
# 
# Call:
# lm(formula = sqrt(DACF) ~ sqrt(Population), data = Cleaned_KMA_Data)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -736.39 -294.12  -16.42  253.97  594.30 
# 
# Coefficients:
#                    Estimate Std. Error t value Pr(>|t|)  
# (Intercept)      -1131.5392  1692.4528  -0.669   0.5205  
# sqrt(Population)     2.0065     0.9909   2.025   0.0735 .
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Residual standard error: 433.7 on 9 degrees of freedom
# Multiple R-squared:  0.313,   Adjusted R-squared:  0.2366 
# F-statistic:   4.1 on 1 and 9 DF,  p-value: 0.07354
#  Scatter Plots (Transformed Data)
ggplot(Cleaned_KMA_Data, aes(x = log(Population), y = log(DACF))) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Log(Population) vs. Log(DACF Revenue)",
       x = "Log(Population)", y = "Log(DACF Revenue)")

ggplot(Cleaned_KMA_Data, aes(x = sqrt(Population), y = sqrt(DACF))) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Sqrt(Population) vs. Sqrt(DACF Revenue)",
       x = "Sqrt(Population)", y = "Sqrt(DACF Revenue)")

The linear regression results earlier indicated that the relationship between population size and DACF revenue is not statistically significant. The log and square root models likewise did not yield relationships that are significant at the 5% level (p = 0.05175 and p = 0.07354, respectively).

  • Diagnostics (quadratic model)
# Centering Population  Model
Cleaned_KMA_Data$Population_Centered <- Cleaned_KMA_Data$Population - mean(Cleaned_KMA_Data$Population)
Cleaned_KMA_Data$Population_Centered_Squared <- Cleaned_KMA_Data$Population_Centered^2

mod_quad_centered <- lm(DACF ~ Population_Centered + Population_Centered_Squared, data = Cleaned_KMA_Data)
summary(mod_quad_centered)
## 
## Call:
## lm(formula = DACF ~ Population_Centered + Population_Centered_Squared, 
##     data = Cleaned_KMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2563570  -863129   234425   991928  1959111 
## 
## Coefficients:
##                                      Estimate        Std. Error t value
## (Intercept)                 6679595.937710916  760065.984695505   8.788
## Population_Centered               2.491502754       1.098757369   2.268
## Population_Centered_Squared      -0.000006117       0.000002870  -2.131
##                              Pr(>|t|)    
## (Intercept)                 0.0000221 ***
## Population_Centered            0.0531 .  
## Population_Centered_Squared    0.0657 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1634000 on 8 degrees of freedom
## Multiple R-squared:  0.5349, Adjusted R-squared:  0.4186 
## F-statistic:   4.6 on 2 and 8 DF,  p-value: 0.04681
# Diagnostic Tests on Centered Model
# Residuals vs. Fitted Values (Centered)
ggplot(data = data.frame(residuals = residuals(mod_quad_centered), fitted = fitted(mod_quad_centered)), 
       aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted (Centered Quadratic Model)", x = "Fitted Values", y = "Residuals")

# Histogram of Residuals (Centered)
ggplot(data = data.frame(residuals = residuals(mod_quad_centered)), aes(x = residuals)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Residuals (Centered Quadratic Model)", x = "Residuals")

# Q-Q Plot of Residuals (Centered)
ggplot(data = data.frame(residuals = residuals(mod_quad_centered)), aes(sample = residuals)) +
  geom_point(stat = "qq") +
  stat_qq_line() +
  labs(title = "Q-Q Plot of Residuals (Centered Quadratic Model)")

# Durbin-Watson Test (Centered)
dwtest(mod_quad_centered)
## 
##  Durbin-Watson test
## 
## data:  mod_quad_centered
## DW = 2.585, p-value = 0.5951
## alternative hypothesis: true autocorrelation is greater than 0
# Breusch-Pagan Test (Centered)
bptest(mod_quad_centered)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod_quad_centered
## BP = 1.8585, df = 2, p-value = 0.3949
# VIF (Centered)
vif(mod_quad_centered)
##         Population_Centered Population_Centered_Squared 
##                    1.002787                    1.002787

The centered quadratic model for DACF shows some evidence of a quadratic relationship with population. The overall F-test is statistically significant (p = 0.04681), but Population_Centered (p = 0.0531) and Population_Centered_Squared (p = 0.0657) are each only marginally significant. All regression assumptions of the centered quadratic model are met: the Durbin-Watson (p = 0.5951) and Breusch-Pagan (p = 0.3949) tests show no autocorrelation or heteroscedasticity, and the VIF is close to 1.
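Centering changes how the coefficients are expressed, not the fitted curve. That is why the residuals, R-squared, and squared-term coefficient are the same in the centered and uncentered outputs. A minimal sketch on hypothetical data (same scale as Population, but not the real values) illustrates this:

```r
# Hypothetical data on the same scale as Population (not the real data)
set.seed(3)
x <- seq(2.2e6, 3.6e6, length.out = 11)
y <- -5e7 + 38 * x - 6e-6 * x^2 + rnorm(11, sd = 1e6)

xc <- x - mean(x)

fit_raw      <- lm(y ~ x + I(x^2))
fit_centered <- lm(y ~ xc + I(xc^2))

# Same fitted values, hence same residuals and R-squared
same_fit <- isTRUE(all.equal(fitted(fit_raw), fitted(fit_centered),
                             tolerance = 1e-6))

# The squared-term coefficient is unchanged; only the intercept and the
# linear coefficient are re-expressed at the mean of x
same_b2 <- isTRUE(all.equal(coef(fit_raw)[["I(x^2)"]],
                            coef(fit_centered)[["I(xc^2)"]],
                            tolerance = 1e-6))

c(same_fit = same_fit, same_b2 = same_b2)
```

Both checks should come out `TRUE`. What centering does change is the correlation between the linear and squared terms, which is what brings the VIF down.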

1.3 What is the relationship between population, recurrent and capital expenditure?

  • Descriptive Statistics
# Calculate descriptive statistics
desc_stats <- Cleaned_KMA_Data %>%
  summarize(
    Population_mean = mean(Population),
    Population_sd = sd(Population),
    Population_min = min(Population),
    Population_max = max(Population),
    Capital_Expenditure_mean = mean(Capital_Expenditure),
    Capital_Expenditure_sd = sd(Capital_Expenditure),
    Capital_Expenditure_min = min(Capital_Expenditure),
    Capital_Expenditure_max = max(Capital_Expenditure),
    Recrrent_Expenditure_mean = mean(Recrrent_Expenditure),
    Recrrent_Expenditure_sd = sd(Recrrent_Expenditure),
    Recrrent_Expenditure_min = min(Recrrent_Expenditure),
    Recrrent_Expenditure_max = max(Recrrent_Expenditure)
  )


cat("
## Descriptive Statistics

| Statistic               | Population | Capital Expenditure | Recurrent Expenditure |
|------------------------|------------|---------------------|-----------------------|
| Mean                   |", format(desc_stats$Population_mean, big.mark = ",", digits = 2),
  "|", format(desc_stats$Capital_Expenditure_mean, big.mark = ",", digits = 2),
  "|", format(desc_stats$Recrrent_Expenditure_mean, big.mark = ",", digits = 2), "|
| Standard Deviation     |", format(desc_stats$Population_sd, big.mark = ",", digits = 2),
  "|", format(desc_stats$Capital_Expenditure_sd, big.mark = ",", digits = 2),
  "|", format(desc_stats$Recrrent_Expenditure_sd, big.mark = ",", digits = 2), "|
| Minimum                |", format(desc_stats$Population_min, big.mark = ",", digits = 2),
  "|", format(desc_stats$Capital_Expenditure_min, big.mark = ",", digits = 2),
  "|", format(desc_stats$Recrrent_Expenditure_min, big.mark = ",", digits = 2), "|
| Maximum                |", format(desc_stats$Population_max, big.mark = ",", digits = 2),
  "|", format(desc_stats$Capital_Expenditure_max, big.mark = ",", digits = 2),
  "|", format(desc_stats$Recrrent_Expenditure_max, big.mark = ",", digits = 2), "|
\n")
## 
## ## Descriptive Statistics
## 
## | Statistic               | Population | Capital Expenditure | Recurrent Expenditure |
## |------------------------|------------|---------------------|-----------------------|
## | Mean                   | 2,917,182 | 16,386,471 | 17,381,914 |
## | Standard Deviation     | 470,948 | 13,818,549 | 4,197,344 |
## | Minimum                | 2,233,000 | 6,278,840 | 8,979,764 |
## | Maximum                | 3,630,000 | 46,223,724 | 24,001,764 |
# Capital Expenditure Histogram
cap_hist <- ggplot(Cleaned_KMA_Data, aes(x = Capital_Expenditure)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "skyblue", color = "black") +
  geom_density(color = "red") +
  labs(title = "Distribution of Capital Expenditure", x = "Capital Expenditure (Ghana Cedis)", y = "Density") +
  scale_x_continuous(labels = comma) 

# Recurrent Expenditure Histogram
rec_hist <- ggplot(Cleaned_KMA_Data, aes(x = Recrrent_Expenditure)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "lightgreen", color = "black") +
  geom_density(color = "red") +
  labs(title = "Distribution of Recurrent Expenditure", x = "Recurrent Expenditure (Ghana Cedis)", y = "Density") +
  scale_x_continuous(labels = comma) 

# Population Histogram
pop_hist <- ggplot(Cleaned_KMA_Data, aes(x = Population)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "dodgerblue", color = "black") +
  geom_density(color = "red") +
  labs(title = "Distribution of Population", x = "Population", y = "Density") +
  scale_x_continuous(labels = comma) 

cap_hist

rec_hist

pop_hist

  • Trends
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population)) +
  geom_point(aes(y = Population), color = "dodgerblue") +
  labs(title = "Population Trend", x = "Year", y = "Population") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
  geom_point(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
  labs(title = "Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
  theme(axis.title.y.right = element_text(vjust=2))

ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population, color = "Population")) +
  geom_point(aes(y = Population, color = "Population")) +
  geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
  geom_point(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
  labs(title = "Population and Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
  scale_y_continuous(labels = comma, sec.axis = sec_axis(~., name = "Population")) +
  theme(axis.title.y.right = element_text(vjust=2))

ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
  geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
  geom_line(aes(y = Rec_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
  geom_point(aes(y = Rec_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
  labs(title = "Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
  scale_y_continuous(labels = comma)

# Calculate Per Capita Values
Cleaned_KMA_Data$Capital_Exp_Per_Capita <- Cleaned_KMA_Data$Capital_Expenditure / Cleaned_KMA_Data$Population

# Plotting Trends (Improved)
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population, color = "Population")) +
  geom_point(aes(y = Population, color = "Population")) +
  geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  labs(title = "Population and Capital Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
  scale_y_continuous(labels = comma, sec.axis = sec_axis(~., name = "Population")) +
  theme(axis.title.y.right = element_text(vjust=2))

# Per Capita Analysis 
average_capita <- mean(Cleaned_KMA_Data$Capital_Exp_Per_Capita)

ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
  geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
  geom_hline(yintercept = average_capita, linetype = "dashed", color = "red")+
  labs(title = "Capital Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
  scale_y_continuous(labels = comma) 

Cleaned_KMA_Data$Recrrent_Exp_Per_Capita <- Cleaned_KMA_Data$Recrrent_Expenditure / Cleaned_KMA_Data$Population
average_rec_capita <- mean(Cleaned_KMA_Data$Recrrent_Exp_Per_Capita)

ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
  geom_point(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
  geom_hline(yintercept = average_rec_capita, linetype = "dashed", color = "red") +
  labs(title = "Recurrent Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
  scale_y_continuous(labels = comma)

1.3.1 Regression Results

mod3 <- lm(cbind(Capital_Expenditure, Recrrent_Expenditure) ~ Population, data = Cleaned_KMA_Data)
summary(mod3)
## Response Capital_Expenditure :
## 
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_KMA_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -14004703  -7884170  -5442992   1911261  28795577 
## 
## Coefficients:
##                 Estimate   Std. Error t value Pr(>|t|)
## (Intercept) 35476333.333 28140934.259   1.261    0.239
## Population        -6.544        9.534  -0.686    0.510
## 
## Residual standard error: 14200000 on 9 degrees of freedom
## Multiple R-squared:  0.04974,    Adjusted R-squared:  -0.05585 
## F-statistic: 0.4711 on 1 and 9 DF,  p-value: 0.5098
## 
## 
## Response Recrrent_Expenditure :
## 
## Call:
## lm(formula = Recrrent_Expenditure ~ Population, data = Cleaned_KMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -6389181 -2699416  -268187  2219372  7900222 
## 
## Coefficients:
##                Estimate  Std. Error t value Pr(>|t|)
## (Intercept) 8799112.358 8277023.496   1.063    0.315
## Population        2.942       2.804   1.049    0.321
## 
## Residual standard error: 4176000 on 9 degrees of freedom
## Multiple R-squared:  0.109,  Adjusted R-squared:  0.009972 
## F-statistic: 1.101 on 1 and 9 DF,  p-value: 0.3215
mod_cap <- lm(Capital_Expenditure ~ Population, data = Cleaned_KMA_Data)
summary(mod_cap)
## 
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_KMA_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -14004703  -7884170  -5442992   1911261  28795577 
## 
## Coefficients:
##                 Estimate   Std. Error t value Pr(>|t|)
## (Intercept) 35476333.333 28140934.259   1.261    0.239
## Population        -6.544        9.534  -0.686    0.510
## 
## Residual standard error: 14200000 on 9 degrees of freedom
## Multiple R-squared:  0.04974,    Adjusted R-squared:  -0.05585 
## F-statistic: 0.4711 on 1 and 9 DF,  p-value: 0.5098
mod_rec <- lm(Recrrent_Expenditure ~ Population, data = Cleaned_KMA_Data)
summary(mod_rec)
## 
## Call:
## lm(formula = Recrrent_Expenditure ~ Population, data = Cleaned_KMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -6389181 -2699416  -268187  2219372  7900222 
## 
## Coefficients:
##                Estimate  Std. Error t value Pr(>|t|)
## (Intercept) 8799112.358 8277023.496   1.063    0.315
## Population        2.942       2.804   1.049    0.321
## 
## Residual standard error: 4176000 on 9 degrees of freedom
## Multiple R-squared:  0.109,  Adjusted R-squared:  0.009972 
## F-statistic: 1.101 on 1 and 9 DF,  p-value: 0.3215
Cleaned_KMA_Data %>% 
  ggplot(aes(x = Population, y = Capital_Expenditure)) +
  geom_point()+
  geom_smooth(method = "lm", se = TRUE) + labs(x = "Population", y = "Capital Expenditure", title = "Linear Relationship Population and Capital Expenditure")+
   scale_y_continuous(labels = scales::comma)

Cleaned_KMA_Data %>%
  ggplot(aes(x = Population, y = Recrrent_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Population", y = "Recurrent Expenditure", title = "Linear Relationship Population and Recurrent Expenditure") +
  scale_y_continuous(labels = scales::comma)

From the linear regression results, neither the F-statistic nor the slope is statistically significant for either model (p = 0.510 for capital expenditure and p = 0.321 for recurrent expenditure). The analysis therefore finds no statistically significant linear relationship between population and either recurrent or capital expenditure. The estimated slope is negative for capital expenditure and positive for recurrent expenditure, but neither differs significantly from zero. The low R-squared values (0.05 and 0.11) indicate that population is not a good linear predictor of either type of expenditure. Based on these linear models, it cannot be concluded that changes in population reliably predict changes in either expenditure, and any observed pattern could be due to chance.
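The quantities quoted in this interpretation can be pulled straight out of a fitted model rather than read off the printout. The sketch below uses a small made-up data frame (`toy`, with columns `x` and `y`, both hypothetical and unrelated to the KMA data) to show where the overall F-test p-value and R-squared live in a `summary.lm` object.

```r
# Hypothetical data for illustration only; not the KMA data.
set.seed(1)
toy <- data.frame(x = 1:11)
toy$y <- 5 + rnorm(11)          # y is unrelated to x by construction

fit <- lm(y ~ x, data = toy)
s <- summary(fit)

# Overall F-test p-value, computed from the stored F statistic
f <- s$fstatistic
overall_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# With a single predictor, the F-test and the slope t-test give the same p-value
slope_p <- s$coefficients["x", "Pr(>|t|)"]

c(r_squared = s$r.squared, overall_p = overall_p, slope_p = slope_p)
```

With one predictor the two p-values coincide, which is why the F-test and slope p-values agree in each simple regression above.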

  • Checking Regression Assumptions
dwtest(mod_cap)
## 
##  Durbin-Watson test
## 
## data:  mod_cap
## DW = 0.78624, p-value = 0.001835
## alternative hypothesis: true autocorrelation is greater than 0
dwtest(mod_rec)
## 
##  Durbin-Watson test
## 
## data:  mod_rec
## DW = 2.2187, p-value = 0.4983
## alternative hypothesis: true autocorrelation is greater than 0
# Autocorrelation
dwtest(mod3)
## 
##  Durbin-Watson test
## 
## data:  mod3
## DW = 0.9003, p-value = 0.004745
## alternative hypothesis: true autocorrelation is greater than 0
# Homoscedasticity (Constant Variance of Residuals)
bptest(mod_cap)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod_cap
## BP = 0.67429, df = 1, p-value = 0.4116
bptest(mod_rec)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod_rec
## BP = 3.1297, df = 1, p-value = 0.07688
bptest(mod3)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod3
## BP = 3.9615, df = 1, p-value = 0.04655

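From the above tests, the Durbin-Watson results indicate positive autocorrelation in the residuals of the capital expenditure model (DW = 0.786, p = 0.002) but not in the recurrent expenditure model (DW = 2.219, p = 0.498). The Breusch-Pagan tests do not reject constant variance for either single-response model at the 5% level (p = 0.412 and p = 0.077), although the test on the combined model (mod3) is marginally significant (p = 0.047). The models therefore violate some of the regression assumptions, most clearly the independence of residuals in the capital expenditure model.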
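To make the Durbin-Watson numbers easier to read, the statistic itself can be computed by hand from a residual vector: it is the sum of squared successive differences divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, and values well below 2 suggest positive autocorrelation. The residual vectors in this sketch are invented for illustration and are not taken from the models above.

```r
# Durbin-Watson statistic from a residual vector (base R only).
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

# Hypothetical residuals for illustration, not taken from the models above
smooth_resid <- c(-3, -2, -2, -1, 0, 1, 2, 2, 3, 2, 1)   # neighbours are similar
jumpy_resid  <- c(-3, 3, -2, 2, -1, 1, -2, 2, -3, 3, -1) # neighbours alternate

dw_stat(smooth_resid)  # well below 2: positive autocorrelation
dw_stat(jumpy_resid)   # well above 2: negative autocorrelation
```

Applying `dw_stat(residuals(mod_cap))` should give the same DW value that `dwtest()` reports; the test's p-value needs the additional machinery in `lmtest`.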

  • Transformations
Cleaned_KMA_Data$Ln_Population <- log(Cleaned_KMA_Data$Population)
Cleaned_KMA_Data$Ln_Capital_Expenditure <- log(Cleaned_KMA_Data$Capital_Expenditure)



#Transformed Models
mod4 <- lm(log(Capital_Expenditure) ~ log(Population), data = Cleaned_KMA_Data) 
summary(mod4)
# 
# Call:
# lm(formula = log(Capital_Expenditure) ~ log(Population), data = Cleaned_KMA_Data)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -0.8910 -0.4519 -0.2916  0.3138  1.2492 
# 
# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)
# (Intercept)       32.709     20.986   1.559    0.154
# log(Population)   -1.100      1.411  -0.779    0.456
# 
# Residual standard error: 0.7294 on 9 degrees of freedom
# Multiple R-squared:  0.06324, Adjusted R-squared:  -0.04084 
# F-statistic: 0.6076 on 1 and 9 DF,  p-value: 0.4557
  mod_rec_log <- lm(log(Recrrent_Expenditure) ~ log(Population), data = Cleaned_KMA_Data)
  summary(mod_rec_log)
# 
# Call:
# lm(formula = log(Recrrent_Expenditure) ~ log(Population), data = Cleaned_KMA_Data)
# 
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -0.45074 -0.14941 -0.02363  0.17303  0.45815 
# 
# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)
# (Intercept)       6.1923     7.4054   0.836    0.425
# log(Population)   0.7024     0.4978   1.411    0.192
# 
# Residual standard error: 0.2574 on 9 degrees of freedom
# Multiple R-squared:  0.1811,  Adjusted R-squared:  0.09015 
# F-statistic: 1.991 on 1 and 9 DF,  p-value: 0.1919
ggplot(Cleaned_KMA_Data, aes(x = log(Population), y = log(Capital_Expenditure))) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE)+
  labs(title = "Log(Population) vs. Log(Capital Expenditure)",
       x = "Log(Population)", y = "Log(Capital Expenditure)")

  ggplot(Cleaned_KMA_Data, aes(x = log(Population), y = log(Recrrent_Expenditure))) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Log(Population) vs. Log(Recurrent Expenditure)",
       x = "Log(Population)", y = "Log(Recurrent Expenditure)")

After the log transformations, neither model is statistically significant. Quadratic models are fitted below.
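In a log-log model the slope can be read as an elasticity: the approximate percentage change in expenditure associated with a 1% change in population. The sketch below does that arithmetic with the two slopes printed above. Since neither slope is statistically significant, the results are illustrative only; the helper `pct_change` is a name introduced here.

```r
# Slopes copied from the log-log summaries above
b_capital   <- -1.100    # log(Capital_Expenditure) ~ log(Population)
b_recurrent <-  0.7024   # log(Recrrent_Expenditure) ~ log(Population)

# Implied percentage change in expenditure for a given % rise in population
pct_change <- function(b, pct = 1) ((1 + pct / 100)^b - 1) * 100

pct_change(b_capital)    # about -1.1
pct_change(b_recurrent)  # about  0.7
```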

  • Quadratic model
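# Squared population term used by mod_quad (recomputed here in case it was not created earlier)
Cleaned_KMA_Data$Population_Squared <- Cleaned_KMA_Data$Population^2

Cleaned_KMA_Data$Recrrent_Expenditure_squared <- Cleaned_KMA_Data$Recrrent_Expenditure^2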

Cleaned_KMA_Data$Capital_Expenditure_squared <- Cleaned_KMA_Data$Capital_Expenditure^2

mod_quad <- lm(cbind(Capital_Expenditure, Recrrent_Expenditure) ~ Population + Population_Squared, data = Cleaned_KMA_Data)

# View the summary
summary(mod_quad)
## Response Capital_Expenditure :
## 
## Call:
## lm(formula = Capital_Expenditure ~ Population + Population_Squared, 
##     data = Cleaned_KMA_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -13671517  -6284597   -594718   5275778  20398369 
## 
## Coefficients:
##                               Estimate          Std. Error t value Pr(>|t|)  
## (Intercept)        -370475527.52983952  170044595.95024911  -2.179   0.0610 .
## Population                277.50152126        118.16453279   2.348   0.0468 *
## Population_Squared         -0.00004852          0.00002014  -2.409   0.0426 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11470000 on 8 degrees of freedom
## Multiple R-squared:  0.4492, Adjusted R-squared:  0.3116 
## F-statistic: 3.263 on 2 and 8 DF,  p-value: 0.09201
## 
## 
## Response Recrrent_Expenditure :
## 
## Call:
## lm(formula = Recrrent_Expenditure ~ Population + Population_Squared, 
##     data = Cleaned_KMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -4746979 -3030151  -413017  2141307  7879933 
## 
## Coefficients:
##                               Estimate          Std. Error t value Pr(>|t|)
## (Intercept)        -40224855.249450110  63325308.704196945  -0.635    0.543
## Population                37.244339443        44.004959260   0.846    0.422
## Population_Squared        -0.000005859         0.000007500  -0.781    0.457
## 
## Residual standard error: 4270000 on 8 degrees of freedom
## Multiple R-squared:  0.1721, Adjusted R-squared:  -0.03485 
## F-statistic: 0.8316 on 2 and 8 DF,  p-value: 0.4697
# Scatter plots with quadratic fit
ggplot(Cleaned_KMA_Data, aes(x = Population, y = Capital_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) +
  labs(x = "Population", y = "Capital Expenditure (Ghana Cedis)", title = "Quadratic Relationship between Population and Capital Expenditure") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_KMA_Data, aes(x = Population, y = Recrrent_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) +
  labs(x = "Population", y = "Recurrent Expenditure (Ghana Cedis)", title = "Quadratic Relationship between Population and Recurrent Expenditure") +
  scale_y_continuous(labels = comma)

In the quadratic model for capital expenditure, both the population term (p = 0.047) and the squared term (p = 0.043) are individually significant at the 5% level, unlike in the linear and log models. However, the overall F-test is not significant at that level (p = 0.092), so this is only suggestive evidence of a non-linear (inverted-U) relationship between population and capital expenditure. For recurrent expenditure, neither the log model nor the quadratic model shows a statistically significant relationship. Capital expenditure is therefore somewhat better described by a quadratic function of population (R-squared rises from 0.05 to 0.45), while the relationship between population and recurrent expenditure remains unclear.
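Because the squared term is negative, the fitted curve for capital expenditure is an inverted U, and its turning point follows from setting the slope b1 + 2*b2*x to zero. The sketch below works this out from the rounded coefficients printed in the summary, so the result is approximate.

```r
# Coefficients copied (rounded) from the capital expenditure block of summary(mod_quad)
b1 <- 277.50152126      # Population
b2 <- -0.00004852       # Population_Squared

# A quadratic b0 + b1*x + b2*x^2 has its turning point where the slope is zero
turning_point <- -b1 / (2 * b2)
turning_point

# Slope of the fitted curve at the sample mean population (2,917,182)
slope_at_mean <- b1 + 2 * b2 * 2917182
slope_at_mean
```

The turning point comes out at roughly 2.86 million people, inside the observed population range (2,233,000 to 3,630,000). The slope at the mean is a small negative number, close to the coefficient that a mean-centered population term would receive.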

Cleaned_KMA_Data$Population_Centered <- Cleaned_KMA_Data$Population - mean(Cleaned_KMA_Data$Population)
Cleaned_KMA_Data$Population_Centered_Squared <- Cleaned_KMA_Data$Population_Centered^2



# Quadratic Model
cap_exp_quad_mod <- lm(Capital_Expenditure ~ Population_Centered + Population_Centered_Squared, data = Cleaned_KMA_Data)
summary(cap_exp_quad_mod)
## 
## Call:
## lm(formula = Capital_Expenditure ~ Population_Centered + Population_Centered_Squared, 
##     data = Cleaned_KMA_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -13671517  -6284597   -594718   5275778  20398369 
## 
## Coefficients:
##                                      Estimate        Std. Error t value
## (Intercept)                 26168907.58408355  5333094.57326141   4.907
## Population_Centered               -5.56479613        7.70956348  -0.722
## Population_Centered_Squared       -0.00004852        0.00002014  -2.409
##                             Pr(>|t|)   
## (Intercept)                  0.00118 **
## Population_Centered          0.49097   
## Population_Centered_Squared  0.04258 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11470000 on 8 degrees of freedom
## Multiple R-squared:  0.4492, Adjusted R-squared:  0.3116 
## F-statistic: 3.263 on 2 and 8 DF,  p-value: 0.09201
# Capital Expenditure Diagnostics (Quadratic)
# Residuals vs. Fitted
ggplot(data = data.frame(residuals = residuals(cap_exp_quad_mod), fitted = fitted(cap_exp_quad_mod)),
       aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted (Capital Expenditure - Quadratic)", x = "Fitted Values", y = "Residuals")

# Histogram of Residuals
ggplot(data = data.frame(residuals = residuals(cap_exp_quad_mod)), aes(x = residuals)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Residuals (Capital Expenditure - Quadratic)", x = "Residuals")

# Q-Q Plot of Residuals
ggplot(data = data.frame(residuals = residuals(cap_exp_quad_mod)), aes(sample = residuals)) +
  geom_point(stat = "qq") +
  stat_qq_line() +
  labs(title = "Q-Q Plot of Residuals (Capital Expenditure - Quadratic)")

# Durbin-Watson Test
dwtest(cap_exp_quad_mod)
## 
##  Durbin-Watson test
## 
## data:  cap_exp_quad_mod
## DW = 1.2356, p-value = 0.007197
## alternative hypothesis: true autocorrelation is greater than 0
# Breusch-Pagan Test
bptest(cap_exp_quad_mod)
## 
##  studentized Breusch-Pagan test
## 
## data:  cap_exp_quad_mod
## BP = 3.4371, df = 2, p-value = 0.1793
# VIF
vif(cap_exp_quad_mod)
##         Population_Centered Population_Centered_Squared 
##                    1.002787                    1.002787

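The diagnostics for the quadratic capital expenditure model show that the assumptions checked above are broadly met: the Breusch-Pagan test does not reject constant variance (p = 0.179), and centering keeps the VIF close to 1. The exception is autocorrelation (DW = 1.236, p = 0.007), which may partly reflect the small sample of 11 yearly observations.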
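The VIF values near 1 are a consequence of centering population before squaring it. A raw variable and its square are almost perfectly correlated when all values are large and positive, whereas a centered variable and its square are not. The sketch below illustrates this with an evenly spaced, hypothetical population series (not the actual yearly figures), using the two-predictor identity VIF = 1 / (1 - r^2).

```r
# Hypothetical population series spanning roughly the range in the data
pop <- seq(2233000, 3630000, length.out = 11)

# With two predictors, VIF = 1 / (1 - r^2), where r is their correlation
vif_two <- function(a, b) 1 / (1 - cor(a, b)^2)

raw_vif      <- vif_two(pop, pop^2)
pop_centered <- pop - mean(pop)
centered_vif <- vif_two(pop_centered, pop_centered^2)

c(raw = raw_vif, centered = centered_vif)
```

Centering changes the coefficient on the linear term but leaves the squared-term coefficient, the fitted values and the residuals unchanged.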

1.4 What is the relationship between revenue growth and infrastructure delivery (Model)

Using total revenue growth rate and infrastructure delivery (capital expenditure per capita).

# Descriptive statistics
Cleaned_KMA_Data %>% skim(Capital_Exp_Per_Capita)
Data summary
Name Piped data
Number of rows 11
Number of columns 87
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Capital_Exp_Per_Capita 0 1 5.85 5.02 1.73 2.6 4.01 7.14 16.76 ▇▁▁▁▁
Cleaned_KMA_Data %>% skim(TtRev_Growth_Rate)
Data summary
Name Piped data
Number of rows 11
Number of columns 87
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
TtRev_Growth_Rate 1 0.91 5.33 20.53 -27.39 -9.3 2.22 20.6 40.94 ▂▇▅▇▂
# Histograms
ggplot(Cleaned_KMA_Data, aes(x = Capital_Exp_Per_Capita)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of Capital expenditure per capita", x = "Capital expenditure per capita") +
  scale_x_continuous(labels = comma)

ggplot(Cleaned_KMA_Data, aes(x = TtRev_Growth_Rate)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of Total Revenue Growth Rate", x = "Total revenue growth rate") +
  scale_x_continuous(labels = percent_format(scale = 1))

# Plotting Trends 

ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = TtRev_Growth_Rate, color = "Total Revenue Growth Rate")) +
  geom_point(aes(y = TtRev_Growth_Rate, color = "Total Revenue Growth Rate")) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Expenditure Per Capita")) +
  geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Expenditure Per Capita")) +
  labs(
    title = "Total Revenue Growth Rate vs. Capital Expenditure Per Capita",
    x = "Year",
    y = "Total Revenue Growth Rate (%)"  
  ) +
  scale_y_continuous(
    labels = percent_format(scale = 1),  
    sec.axis = sec_axis(~., name = "Capital Expenditure Per Capita")
  ) +
  scale_color_manual(
    values = c("Total Revenue Growth Rate" = "lightseagreen", "Capital Expenditure Per Capita" = "indianred"),
    name = "Type"
  ) +
  theme(axis.title.y.right = element_text(vjust = 2))

The histograms show an uneven distribution of capital expenditure per capita. The trend plot shows clearly that the total revenue growth rate (which fluctuated substantially) is not directly linked to capital expenditure per capita (which remained comparatively stable).

1.4.1 Regression results

mod5 <- lm(Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_KMA_Data)
summary(mod5)
## 
## Call:
## lm(formula = Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_KMA_Data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.743 -3.264 -2.444  2.431 10.004 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)  
## (Intercept)        5.95376    1.79404   3.319   0.0106 *
## TtRev_Growth_Rate  0.03290    0.08883   0.370   0.7207  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.472 on 8 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.01686,    Adjusted R-squared:  -0.106 
## F-statistic: 0.1372 on 1 and 8 DF,  p-value: 0.7207
ggplot(Cleaned_KMA_Data, aes(x = TtRev_Growth_Rate, y = Capital_Exp_Per_Capita)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE)+
  labs(title = "Revenue Growth vs. Capital Expenditure (Per Capita)",
       x = "Total Revenue Growth Rate (%)",
       y = "Capital Expenditure Per Capita")

The regression results show no statistically significant relationship between total revenue growth rate and infrastructure delivery (capital expenditure per capita): the p-value (0.7207) is greater than the 0.05 significance level. This means that changes in revenue growth do not significantly predict changes in capital expenditure per capita in this model. The R-squared (0.01686) indicates that only 1.69% of the variation in capital expenditure per capita is explained by the total revenue growth rate.
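As a quick check on how the printed figures relate to one another, the slope's p-value can be recovered from its t value and the residual degrees of freedom. With a single predictor, the F statistic is also the square of that t value. The inputs below are the rounded values from the summary, so the results match the printout only approximately.

```r
# Values copied from summary(mod5) above
t_value  <- 0.370
df_resid <- 8

# Two-sided p-value for the slope
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# With one predictor, the F statistic is the square of the slope's t value
f_value <- t_value^2

c(p_value = p_value, f_value = f_value)
```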

1.5 What is the relationship between expenditure growth and infrastructure delivery?

  • Regression results using expenditure growth (Expenditure_Growth) and infrastructure delivery (capital expenditure per capita).
Cleaned_KMA_Data$Expenditure_Growth <- c(NA, diff(Cleaned_KMA_Data$Total_Expenditure) / Cleaned_KMA_Data$Total_Expenditure[-nrow(Cleaned_KMA_Data)]) * 100

mod6 <- lm(Capital_Exp_Per_Capita ~ Expenditure_Growth, data = Cleaned_KMA_Data)
  summary(mod6)
## 
## Call:
## lm(formula = Capital_Exp_Per_Capita ~ Expenditure_Growth, data = Cleaned_KMA_Data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.520 -3.215 -2.549  3.080  8.842 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)  
## (Intercept)         5.58243    1.78006   3.136   0.0139 *
## Expenditure_Growth  0.04955    0.05656   0.876   0.4065  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.272 on 8 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.08753,    Adjusted R-squared:  -0.02652 
## F-statistic: 0.7674 on 1 and 8 DF,  p-value: 0.4065
  ggplot(Cleaned_KMA_Data, aes(x = Expenditure_Growth, y = Capital_Exp_Per_Capita)) +
    geom_point() + geom_smooth(method = "lm", se = TRUE)+
    labs(title = "Expenditure Growth vs. Capital Expenditure (Per Capita)",
         x = "Expenditure Growth Rate (%)",
         y = "Capital Expenditure Per Capita")

  lm(log(Capital_Exp_Per_Capita) ~ Expenditure_Growth, data = Cleaned_KMA_Data) %>% 
  summary()
## 
## Call:
## lm(formula = log(Capital_Exp_Per_Capita) ~ Expenditure_Growth, 
##     data = Cleaned_KMA_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9846 -0.5061 -0.2824  0.6932  1.1417 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.423110   0.273628   5.201 0.000822 ***
## Expenditure_Growth 0.008126   0.008695   0.935 0.377336    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8104 on 8 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.09844,    Adjusted R-squared:  -0.01426 
## F-statistic: 0.8735 on 1 and 8 DF,  p-value: 0.3773

The linear regression shows no statistically significant relationship between expenditure growth and capital expenditure per capita (p = 0.4065), and the result remains non-significant after log-transforming the response (p = 0.3773).
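The `Expenditure_Growth` variable used in these models is a year-on-year percentage change, with `NA` for the first year because there is no previous value. The sketch below applies the same construction to a short made-up vector (`total`, hypothetical) so the arithmetic is easy to follow.

```r
# Hypothetical yearly totals, for illustration only
total <- c(100, 110, 99, 120)

# Same construction as Expenditure_Growth above:
# (this year - last year) / last year, in percent, with NA for the first year
growth <- c(NA, diff(total) / total[-length(total)]) * 100
growth
```

The leading `NA` is why each regression on a growth rate reports one observation deleted due to missingness.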

2 SHEET 2

2.1 What is the relationship between allocative and funding decision-making and revenue patterns?

# no variables

2.2 What is the relationship between allocative decision-making and expenditure patterns?

  • No direct variables are available for this question; some descriptive statistics for closely related variables are shown below.
# Expenditure Composition:
Cleaned_KMA_Data$CapExp_Pct <- (Cleaned_KMA_Data$Capital_Expenditure / Cleaned_KMA_Data$Total_Expenditure) 
Cleaned_KMA_Data$CapExp_Rev_Ratio <- (Cleaned_KMA_Data$Capital_Expenditure / Cleaned_KMA_Data$Total_Revenue)



# Expenditure Composition 
ggplot(Cleaned_KMA_Data, aes(x = Year, y = CapExp_Pct)) +
  geom_bar(stat = "identity", fill = "dodgerblue") +
  geom_point()+
  labs(title = "Capital Expenditure as Percentage of Total Expenditure",
       x = "Year",
       y = "Percentage") +
  scale_y_continuous(labels = percent_format(accuracy = 1))

# Trends of Revenue and Expenditure over the years.

ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = Total_Revenue, color = "Total Revenue")) +
  geom_point(aes(y = Total_Revenue)) +
  geom_line(aes(y = Total_Expenditure, color = "Total Expenditure")) +
  geom_point(aes(y = Total_Expenditure)) +
  labs(title = "Revenue and Expenditure Trends Over Years",
       x = "Year",
       y = "Amount (Ghana Cedis)", color = "Type") +
  scale_color_manual(values = c("Total Revenue" = "blue", "Total Expenditure" = "red")) +
  scale_y_continuous(labels = comma) 

ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = Total_Revenue, color = "Total Revenue"), size = 1) +
  geom_line(aes(y = IGF, color = "IGF"), size = 1) +
  geom_line(aes(y = DACF, color = "DACF"), size = 1) +
  geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure"), size = 1) +
  geom_line(aes(y = Total_Expenditure, color = "Total Expenditure"), size = 1) +
  geom_line(aes(y = Others_Sources, color = "Other Sources"), size = 1) +
  labs(
    title = "Revenue and Expenditure Trends Over Years",
    x = "Year",
    y = "Amount (Ghana Cedis)",
    color = "Type"
  ) +
  scale_color_manual(
    values = c(
      "Total Revenue" = "blue",
      "Other Sources" = "skyblue",
      "IGF" = "green",
      "DACF" = "darkgray",
      "Capital Expenditure" = "purple",
      "Total Expenditure" = "red"
    )
  ) +
  scale_y_continuous(labels = scales::comma) +
  theme(
    legend.position = "right", 
    legend.title = element_text(face = "bold"), 
    plot.title = element_text(hjust = 0.5, face = "bold") 
  )

# IGF to Total Expenditure Ratio 
ggplot(Cleaned_KMA_Data, aes(x = Year, y = IGF_TE)) +
  geom_line(color = "steelblue", size = 1) +
  geom_point(size = 2.5) +
  labs(
    title = "IGF to Total Expenditure Ratio Over Years",
    x = "Year",
    y = "Ratio (IGF/Total Expenditure)"
  ) +
  scale_y_continuous(labels = percent_format(accuracy = 1)) 

# CapExp_Rev_Ratio plot.
ggplot(Cleaned_KMA_Data, aes(x = Year, y = CapExp_Rev_Ratio)) +
  geom_line(color = "steelblue", size = 1) +
  geom_point(size = 2.5) +
  labs(
    title = "Capital Expenditure to Total Revenue Ratio Over Years",
    x = "Year",
    y = "Ratio (Capital Expenditure/Total Revenue)"
  ) +
  scale_y_continuous(labels = comma) 

cor.test(Cleaned_KMA_Data$Total_Expenditure, Cleaned_KMA_Data$Total_Revenue)
## 
##  Pearson's product-moment correlation
## 
## data:  Cleaned_KMA_Data$Total_Expenditure and Cleaned_KMA_Data$Total_Revenue
## t = 10.303, df = 9, p-value = 0.000002788
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8495727 0.9898772
## sample estimates:
##       cor 
## 0.9601297

In the plots above, capital expenditure as a percentage of total expenditure is relatively high around 2016, where it peaks, and then declines steadily. There is also a strong positive correlation between total revenue and total expenditure (r = 0.96), with both peaking around 2016 and falling afterwards.
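The t statistic reported by `cor.test()` follows directly from the correlation coefficient and the sample size, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below reproduces it from the printed values, taking n = 11 from the 9 degrees of freedom shown in the output.

```r
# Values copied from the cor.test() output above
r <- 0.9601297
n <- 11            # 11 yearly observations, so df = n - 2 = 9

t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)

c(t = t_stat, p = p_val, r_squared = r^2)
```

The squared correlation of about 0.92 means the two series share roughly 92% of their variance. Both series move together over time, so this on its own says nothing about which drives the other.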

2.3 What is the relationship between population trend, service delivery and revenue and expenditure patterns?

# Revenue Per Capita
Cleaned_KMA_Data$Total_Revenue_Per_Capita <- Cleaned_KMA_Data$Total_Revenue / Cleaned_KMA_Data$Population
Cleaned_KMA_Data$IGF_Per_Capita <- Cleaned_KMA_Data$IGF / Cleaned_KMA_Data$Population
Cleaned_KMA_Data$DACF_Per_Capita <- Cleaned_KMA_Data$DACF / Cleaned_KMA_Data$Population

# Time Series Plots (Improved)

# Total Revenue and Expenditure Trends
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = Total_Revenue, color = "Total Revenue"), size = 1) +
  geom_point(aes(y = Total_Revenue, color = "Total Revenue")) +
  geom_line(aes(y = IGF, color = "IGF"), size = 1) +
  geom_point(aes(y = IGF, color = "IGF")) +
  geom_line(aes(y = DACF, color = "DACF"), size = 1) +
  geom_point(aes(y = DACF, color = "DACF")) +
  geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure"), size = 1) +
  geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_line(aes(y = Total_Expenditure, color = "Total Expenditure"), size = 1) +
  geom_point(aes(y = Total_Expenditure, color = "Total Expenditure")) +
  geom_line(aes(y = Others_Sources, color = "Other Sources"), size = 1) +
  geom_point(aes(y = Others_Sources, color = "Other Sources")) +
  labs(
    title = "Revenue and Expenditure Trends Over Years",
    x = "Year",
    y = "Amount (Ghana Cedis)",
    color = "Type"
  ) +
  scale_color_manual(
    values = c(
      "Total Revenue" = "blue",
      "Other Sources" = "skyblue",
      "IGF" = "green",
      "DACF" = "darkgray",
      "Capital Expenditure" = "purple",
      "Total Expenditure" = "red"
    )
  ) +
  scale_y_continuous(labels = comma) +
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold"),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )

# Population Trend
ggplot(Cleaned_KMA_Data, aes(x = Year, y = Population)) +
  geom_line(color = "steelblue", size = 1) +
  geom_point(size = 2.5) +
  labs(
    title = "Population Trend Over Years",
    x = "Year",
    y = "Population"
  ) +
  scale_y_continuous(labels = comma) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.title = element_text(face = "bold")
  )

# IGF to Total Expenditure Ratio
ggplot(Cleaned_KMA_Data, aes(x = Year, y = IGF_TE)) +
  geom_line(color = "steelblue", size = 1) +
  geom_point(size = 2.5) +
  labs(
    title = "IGF to Total Expenditure Ratio Over Years",
    x = "Year",
    y = "Ratio (IGF/Total Expenditure)"
  ) +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.title = element_text(face = "bold")
  )

# Per capita plot
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
  geom_line(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
  geom_point(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
  geom_line(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
  geom_point(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
  geom_line(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
  geom_point(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
  labs(title = "Revenue Per Capita trends", x = "Year", y = "Amount (Ghana Cedis)", color = "Type") +
  scale_y_continuous(labels = comma) 

cor_matrix <- cor(Cleaned_KMA_Data[, c("Population", "Total_Revenue", "Total_Expenditure", "IGF_TE", "CapExp_Pct", "IGF")], use = "complete.obs")
print(cor_matrix)
##                   Population Total_Revenue Total_Expenditure      IGF_TE
## Population         1.0000000     0.1868282         0.1034625  0.44600059
## Total_Revenue      0.1868282     1.0000000         0.9601297 -0.43561569
## Total_Expenditure  0.1034625     0.9601297         1.0000000 -0.55086584
## IGF_TE             0.4460006    -0.4356157        -0.5508658  1.00000000
## CapExp_Pct        -0.4403760     0.6182895         0.7034446 -0.50888391
## IGF                0.4833200     0.8707321         0.8079852  0.02758598
##                   CapExp_Pct        IGF
## Population        -0.4403760 0.48331996
## Total_Revenue      0.6182895 0.87073207
## Total_Expenditure  0.7034446 0.80798522
## IGF_TE            -0.5088839 0.02758598
## CapExp_Pct         1.0000000 0.44219773
## IGF                0.4421977 1.00000000
corrplot(cor_matrix, main = "Correlation matrix of population and expenditure patterns")

The correlation matrix above shows a strong positive correlation between Total Revenue and Total Expenditure (r = 0.96), and also between IGF and both Total Revenue (r = 0.87) and Total Expenditure (r = 0.81).
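
With only 11 annual observations, the size of a correlation coefficient says little on its own about how reliable it is. The sketch below shows one way a p-value and confidence interval could be attached to a single pair using `cor.test()` from base R. The helper name `cor_with_test` is hypothetical, and the example call uses the built-in `mtcars` data so that it runs on its own; applying it to `Cleaned_KMA_Data` is an assumption about the column names shown above.

```r
# Hypothetical helper: Pearson correlation with p-value and 95% CI for one pair
cor_with_test <- function(data, x, y) {
  res <- cor.test(data[[x]], data[[y]], method = "pearson")
  data.frame(
    var_x   = x,
    var_y   = y,
    r       = unname(res$estimate),   # correlation coefficient
    p_value = res$p.value,            # two-sided test of r = 0
    ci_low  = res$conf.int[1],        # lower bound of the 95% CI
    ci_high = res$conf.int[2]         # upper bound of the 95% CI
  )
}

# Standalone demonstration on a built-in dataset
demo <- cor_with_test(mtcars, "wt", "mpg")
print(demo)

# Assumed usage with this document's data (not run here):
# cor_with_test(Cleaned_KMA_Data, "Total_Revenue", "Total_Expenditure")
```

A wide confidence interval for a pair would suggest that the correlation should be read as indicative rather than conclusive.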

2.3.1 Regression Analysis

# Total Revenue vs Population
model_revenue_pop <- lm(Total_Revenue ~ Population, data = Cleaned_KMA_Data)
summary(model_revenue_pop)
## 
## Call:
## lm(formula = Total_Revenue ~ Population, data = Cleaned_KMA_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -21596729  -6733191   -491577   5868454  23135274 
## 
## Coefficients:
##                 Estimate   Std. Error t value Pr(>|t|)
## (Intercept) 34528437.260 25284085.974   1.366    0.205
## Population         4.887        8.566   0.571    0.582
## 
## Residual standard error: 12760000 on 9 degrees of freedom
## Multiple R-squared:  0.0349, Adjusted R-squared:  -0.07233 
## F-statistic: 0.3255 on 1 and 9 DF,  p-value: 0.5823
# Total Expenditure vs Population
model_expenditure_pop <- lm(Total_Expenditure ~ Population, data = Cleaned_KMA_Data)
summary(model_expenditure_pop)
## 
## Call:
## lm(formula = Total_Expenditure ~ Population, data = Cleaned_KMA_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -23753659  -8681700  -3938558   3341443  27290302 
## 
## Coefficients:
##                 Estimate   Std. Error t value Pr(>|t|)
## (Intercept) 40109934.627 31720613.252   1.264    0.238
## Population         3.354       10.747   0.312    0.762
## 
## Residual standard error: 16010000 on 9 degrees of freedom
## Multiple R-squared:  0.0107, Adjusted R-squared:  -0.09922 
## F-statistic: 0.09738 on 1 and 9 DF,  p-value: 0.7621
# Capital Expenditure vs Total Revenue and IGF_TE
model_capital_rev_igf <- lm(Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_KMA_Data)
summary(model_capital_rev_igf)
## 
## Call:
## lm(formula = Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_KMA_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -11267026  -3857738  -1737562   5764288  12456890 
## 
## Coefficients:
##                     Estimate     Std. Error t value Pr(>|t|)  
## (Intercept)    -4631203.8550  24413618.4027  -0.190   0.8543  
## Total_Revenue         0.7929         0.2434   3.257   0.0116 *
## IGF_TE        -39404383.9422  37091492.8899  -1.062   0.3191  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8535000 on 8 degrees of freedom
## Multiple R-squared:  0.6948, Adjusted R-squared:  0.6185 
## F-statistic: 9.105 on 2 and 8 DF,  p-value: 0.008679
# IGF_TE vs Population and Total Revenue
model_igfte_pop_rev <- lm(IGF_TE ~ Population + Total_Revenue, data = Cleaned_KMA_Data)
summary(model_igfte_pop_rev)
## 
## Call:
## lm(formula = IGF_TE ~ Population + Total_Revenue, data = Cleaned_KMA_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.092402 -0.036195 -0.002448  0.030562  0.111567 
## 
## Coefficients:
##                      Estimate      Std. Error t value Pr(>|t|)  
## (Intercept)    0.346718175591  0.142208087563   2.438   0.0407 *
## Population     0.000000093807  0.000000044637   2.102   0.0688 .
## Total_Revenue -0.000000003528  0.000000001706  -2.068   0.0725 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06531 on 8 degrees of freedom
## Multiple R-squared:  0.478,  Adjusted R-squared:  0.3474 
## F-statistic: 3.662 on 2 and 8 DF,  p-value: 0.07427
#  Visualizations

# Scatter plot: Total Revenue vs Population
ggplot(Cleaned_KMA_Data, aes(x = Population, y = Total_Revenue)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Total Revenue vs Population", x = "Population", y = "Total Revenue") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma)

# Scatter plot: Total Expenditure vs Population
ggplot(Cleaned_KMA_Data, aes(x = Population, y = Total_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Total Expenditure vs Population", x = "Population", y = "Total Expenditure") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma)

# Scatter plot: Capital Expenditure vs Total Revenue
ggplot(Cleaned_KMA_Data, aes(x = Total_Revenue, y = Capital_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Capital Expenditure vs Total Revenue", x = "Total Revenue", y = "Capital Expenditure") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma)

# Scatter plot: IGF_TE vs Population
ggplot(Cleaned_KMA_Data, aes(x = Population, y = IGF_TE)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "IGF_TE vs Population", x = "Population", y = "IGF_TE") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = percent_format(accuracy = 1))

# Scatter plot: IGF_TE vs Total Revenue
ggplot(Cleaned_KMA_Data, aes(x = Total_Revenue, y = IGF_TE)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "IGF_TE vs Total Revenue", x = "Total Revenue", y = "IGF_TE") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = percent_format(accuracy = 1))

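The regression results above show no significant linear relationship between Total Revenue and Population (p = 0.582) or between Total Expenditure and Population (p = 0.762). In the Capital Expenditure model, Total Revenue is a significant predictor (p = 0.0116), while IGF_TE is not (p = 0.319). In the model of IGF_TE on Population and Total Revenue, neither predictor is significant at the 5% level: Population (p = 0.0688) and Total Revenue (p = 0.0725) are significant only at the 10% level, and the overall model is likewise marginal (F-test p = 0.074).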
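
Reading significance off several `summary()` printouts by eye is error-prone. As a minimal sketch, the coefficient table of each model could be pulled into a data frame and flagged against a chosen threshold. The helper name `coef_table` and its `alpha` argument are hypothetical, and the demonstration fits a model on the built-in `mtcars` data so that the block runs on its own.

```r
# Hypothetical helper: coefficient table of an lm fit with a significance flag
coef_table <- function(model, alpha = 0.05) {
  # summary.lm stores estimates, standard errors, t values and p-values
  co <- as.data.frame(summary(model)$coefficients)
  names(co) <- c("estimate", "std_error", "t_value", "p_value")
  co$term <- rownames(co)
  rownames(co) <- NULL
  co$significant <- co$p_value < alpha
  co[, c("term", "estimate", "std_error", "t_value", "p_value", "significant")]
}

# Standalone demonstration on a built-in dataset
demo_model <- lm(mpg ~ wt + hp, data = mtcars)
demo_coefs <- coef_table(demo_model)
print(demo_coefs)

# Assumed usage with the models fitted above (not run here):
# coef_table(model_capital_rev_igf)
# coef_table(model_igfte_pop_rev, alpha = 0.10)
```

Passing `alpha = 0.10` makes it explicit when a predictor is being reported as significant only at the 10% level.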

2.4 What is the relationship between service delivery and revenue and expenditure patterns?

# no variables

2.5 SHEET 3

2.6 SHEET 3